Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More metrics for RF, RS and ROB #632

Merged
merged 15 commits into from
Apr 1, 2024
Merged

More metrics for RF, RS and ROB #632

merged 15 commits into from
Apr 1, 2024

Conversation

tilk
Copy link
Member

@tilk tilk commented Mar 27, 2024

This PR adds the following metrics:

  • Histogram of the number of used rows in RF, RS and ROB in each cycle.
  • Histogram of latency for instructions in RS.
  • Histogram of times registers are valid in RF before being freed.

Conclusions from reading the metrics:

  • ROB is currently very underused. Average ROB slots used is around 3-4, maximum is around 11-16. This will probably change with the introduction of jump prediction.
  • RS latency is low, typically 1-2 cycles.

The PR also adds two modules:

  • AsyncMemoryBank which basically wraps an Amaranth memory with asynchronous reads,
  • IndexedLatencyMeasurer which allows to measure latency for things that are not processed in FIFO fashion, but have unique indexes (like RF and RS entries).

TODO:

  • Documentation.
  • Tests.

@tilk tilk added the enhancement New feature or request label Mar 27, 2024
@tilk tilk marked this pull request as ready for review March 28, 2024 11:02
coreblocks/core_structs/rf.py Outdated Show resolved Hide resolved
coreblocks/func_blocks/fu/common/rs.py Outdated Show resolved Hide resolved

time = 0

def ticker():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use yield Now() instead of creating separate process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is written similar to to other metrics tests. I'm not going to do the change here without changing all the others. This is best left for a refactoring PR.

for _ in range(200):
if not free_slots:
yield
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cause that we can have less than 200 iterations. Was this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. I guess changing that if to while should be enough.

Comment on lines 389 to 399
self.assertEqual(min(latencies), (yield m._dut.histogram.min.value))
self.assertEqual(max(latencies), (yield m._dut.histogram.max.value))
self.assertEqual(sum(latencies), (yield m._dut.histogram.sum.value))
self.assertEqual(len(latencies), (yield m._dut.histogram.count.value))

for i in range(m._dut.histogram.bucket_count):
bucket_start = 0 if i == 0 else 2 ** (i - 1)
bucket_end = 1e10 if i == m._dut.histogram.bucket_count - 1 else 2**i

count = sum(1 for x in latencies if bucket_start <= x < bucket_end)
self.assertEqual(count, (yield m._dut.histogram.buckets[i].value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do that common with FIFOLatencyMeasurer test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, maybe that's a refactor worth doing.

@@ -41,10 +43,13 @@ def __init__(self, gen_params: GenParams, func_units: Iterable[tuple[FuncUnit, s
Functional units to be used by this module.
rs_entries: int
Number of entries in RS.
rs_number: int
The number of this RS block. Used for debugging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If used for debugging, maybe there should be a default value? So that if anyone doesn't care about debug feature, then it doesn't have to pass that value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, and I don't think this is a good thing. If rs_number is not set in a given CoreConfiguration then the metric will be hard to read, so the person doing the metrics will need to change the configuration.

Probably it would be better to auto-generate these numbers.

@tilk
Copy link
Member Author

tilk commented Mar 31, 2024

It seems like the test might still have a race. Those Settles are really a nightmare.

@tilk tilk merged commit f8add3c into master Apr 1, 2024
8 checks passed
@tilk tilk deleted the tilk/metrics-for-struct-usage branch April 1, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants